{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Convert Natural Language to SQL Query with ReAct Agent\n",
    "\n",
    "This notebook will show \n",
    "- how to use ReAct agent to convert natural language to SQL query\n",
    "- how to use a customized NL2SQL algorithm, DAIL-SQL, in AgentScope \n",
    "\n",
    "## Background\n",
    "\n",
    "NL2SQl is a challenging task in both database and natural language processing communities. It aims to convert natural language question into SQL query. In research community, there are many works exploring the potential of LLMs in NL2SQL, and it would be very interesting to explore this task with LLM-empowered agents. \n",
    "\n",
    "In this notebook, we try to equip the ReAct agent with a SOTA NL2SQL algorithm, DAIL-SQL, to see if the agent-level reasoning could bring any insights to this task. The implementation of DAIL-SQL refer to its [GitHub repository](https://github.com/BeachWang/DAIL-SQL) and [paper](https://arxiv.org/abs/2308.15363).\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "- Follow [READMD.md](https://github.com/modelscope/agentscope) to install AgentScope.\n",
    "- Install the third-party libraries used in [DAIL-SQL](https://github.com/BeachWang/DAIL-SQL/blob/main/requirements.txt).\n",
    "- Prepare a model configuration. AgentScope supports both local deployed model services (CPU or GPU) and third-party services. More details and example model configurations please refer to our [tutorial](https://modelscope.github.io/agentscope/en/tutorial/203-model.html).\n",
    "- Get familiar with service functions, ServiceToolkit module and build-in ReAct agent. The corresponding example can be found in [ReAct agent example](../conversation_with_react_agent/main.ipynb).\n",
    "\n",
    "## Note\n",
    "\n",
    "- The example is tested with the following models. For other models, you may need to adjust the prompt.\n",
    "    - gpt-4 (The model that [DAIL-SQL](https://github.com/BeachWang/DAIL-SQL) is built on)\n",
    "    - gpt-3.5-turbo   "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Initializing model config and SQL database\n",
    "First, we initalize the agentscope config and load the corresponding model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Callable\n",
    "import agentscope\n",
    "agentscope.init(\n",
    "    model_configs=\"./configs/model_configs.json\",\n",
    "    project=\"Conversation with NL2SQL\",\n",
    ")\n",
    "from agentscope.manager import ModelManager\n",
    "model_manager = ModelManager.get_instance()\n",
    "loaded_model = model_manager.get_model_by_config_name('gpt-4')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then, we init and provide the corresponding database path in sqlite file format.\n",
    "Here we generate the sqlite file using the provided sql commands [here](./database/concert_singer/schema.sql). \n",
    "You can also use the .sqlite format file directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sql_utils import create_sqlite_db_from_schema\n",
    "example_db_id = \"concert_singer\"\n",
    "db_schema_path = \"./database/concert_singer/schema.sql\"\n",
    "db_sqlite_path = \"./database/concert_singer/concert_singer.sqlite\"\n",
    "create_sqlite_db_from_schema(db_schema_path, db_sqlite_path)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can show a general description of the database schema using the code below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The database consists of four tables: \n",
      "\n",
      "1. Stadium: This table includes the following fields: Stadium_ID (which is the primary key), Location, Name, Capacity, Highest, Lowest, and Average.\n",
      "\n",
      "2. Singer: This table includes the following fields: Singer_ID (which is the primary key), Name, Country, Song_Name, Song_release_year, Age, and Is_male (a Boolean field that indicates the singer's gender).\n",
      "\n",
      "3. Concert: This table is linked to the Stadium table via the Stadium_ID field. The table includes the fields concert_ID (which is the primary key), concert_Name, Theme, Stadium_ID, and Year.\n",
      "\n",
      "4. Singer_in_Concert: This table acts as a junction table to establish a many-to-many relationship between the Singer and Concert tables. It includes the fields concert_ID and Singer_ID, which both make up the primary key for the table. These are also reference as foreign keys to their respective basetable: the Concert table via the concert_ID field, and the Singer table via the Singer_ID field.\n",
      "\n",
      "In summary, this database seems designed to track concerts, including details about singers, the concert's location (stadium), and which singer performed at which concert.\n"
     ]
    }
   ],
   "source": [
    "from loguru import logger\n",
    "from sql_utils import SQLPrompt\n",
    "describe_prompt = SQLPrompt().describe_schema(db_sqlite_path)\n",
    "sql_description = loaded_model([{\"role\": \"assistant\", \"content\": describe_prompt}]).text\n",
    "logger.chat(sql_description)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Setting up tool functions\n",
    "Then, we define the tools needed for React Agent to perform SQL query. Namely, our agent should be able to generate sql query given natural language and perform sql query to get the result.\n",
    "\n",
    "- We referenced a third-party Text-to-SQL [DAIL-SQL](https://github.com/BeachWang/DAIL-SQL) to generate Text-to-SQL prompt.\n",
    "- We use the query_sqlite service function in the agentscope.service module. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "from agentscope.service import (\n",
    "    ServiceResponse, \n",
    "    ServiceExecStatus, \n",
    "    ServiceToolkit, \n",
    "    query_sqlite,\n",
    ")\n",
    "from sql_utils import DailSQLPromptGenerator\n",
    "\n",
    "\n",
    "def generate_sql_query(question: str, db_path: str, model: Callable) -> ServiceResponse:\n",
    "    \"\"\"\n",
    "    Generate the sql query based on the natural language question from user.\n",
    "    \n",
    "    Args:\n",
    "        question (`str`):\n",
    "            The question asked by user in natural language.\n",
    "        db_path (`str`):\n",
    "            The path to the .sqlite file.\n",
    "        model (`Callable`):\n",
    "            The LLM model used to generate the SQL query.\n",
    "    \"\"\"\n",
    "    prompt_helper = DailSQLPromptGenerator(db_path)\n",
    "    prepared_prompt = prompt_helper.generate_prompt({\"content\": question})\n",
    "    \n",
    "    def get_response_from_prompt(prompt: dict, model: Callable) -> str:\n",
    "        \"\"\"\n",
    "        Generate response from prompt using LLM\n",
    "        \"\"\"\n",
    "        messages = [{\"role\": \"assistant\", \"content\": prompt}]\n",
    "        sql = model(messages).text\n",
    "        sql = \" \".join(sql.replace(\"\\n\", \" \").split())\n",
    "        sql = sql.strip().split(\"/*\")[0]\n",
    "        if sql.startswith(\"SELECT\"):\n",
    "            response = sql + \"\\n\"\n",
    "        elif sql.startswith(\" \"):\n",
    "            response = \"SELECT\" + sql + \"\\n\"\n",
    "        else:\n",
    "            response = \"SELECT \" + sql + \"\\n\"\n",
    "        return response\n",
    "\n",
    "    sql_response = get_response_from_prompt(\n",
    "        prepared_prompt[\"prompt\"], model=model\n",
    "    )\n",
    "\n",
    "    return ServiceResponse(\n",
    "        ServiceExecStatus.SUCCESS,\n",
    "        sql_response\n",
    "    )\n",
    "\n",
    "\n",
    "# Use Service Toolkit to set up tool functions for LLMs\n",
    "service_toolkit = ServiceToolkit()\n",
    "service_toolkit.add(generate_sql_query, db_path=db_sqlite_path, model=loaded_model)\n",
    "service_toolkit.add(query_sqlite, database=db_sqlite_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Create the ReAct agent to help perform SQL queries\n",
    "\n",
    "Now, we can init the ReAct Agent using the defined tools and interact with the agent."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from agentscope.agents import ReActAgent\n",
    "agent = ReActAgent(\n",
    "    name=\"assistant\",\n",
    "    model_config_name='gpt-4',\n",
    "    service_toolkit=service_toolkit,\n",
    "    sys_prompt=\"You are a helpful agent that preform SQL queries base on natual language instructions.\",\n",
    "    verbose=True, # set verbose to True to show the reasoning process\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4: Interact with the SQL ReAct Agent using natural lanugage\n",
    "\n",
    "Now let's see how the agent have response to our questions:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "user: How many singers do we have?\n",
      "##################### ITER 1, STEP 1: REASONING ######################\n",
      "assistant: {\n",
      "    \"thought\": \"To answer the user's question, I need to find out how many unique singers are listed in the database. I will use the generate_sql_query function to create a SQL query to count distinct singers.\",\n",
      "    \"speak\": \"Let me find this information for you.\",\n",
      "    \"function\": [\n",
      "        {\n",
      "            \"name\": \"generate_sql_query\",\n",
      "            \"arguments\": {\n",
      "                \"question\": \"How many unique singers are we have in the database?\"\n",
      "            }\n",
      "        }\n",
      "    ]\n",
      "}\n",
      "####################### ITER 1, STEP 2: ACTION #######################\n",
      ">>> Executing function generate_sql_query ...\n",
      ">>> END \n",
      "system: Execution Results:\n",
      "1. generate_sql_query:\n",
      "    [EXECUTE STATUS]: SUCCESS\n",
      "    [EXECUTE RESULT]: SELECT To find out how many unique singers are in the database, you would need to count the distinct singer IDs in the \"singer\" table. The appropriate SQL query for this would be: ```sql SELECT COUNT(DISTINCT Singer_ID) FROM singer; ```\n",
      "\n",
      "\n",
      "##################### ITER 2, STEP 1: REASONING ######################\n",
      "assistant: {\n",
      "    \"thought\": \"The generate_sql_query function has provided the necessary SQL query. Now I will use the query_sqlite function to execute this query and retrieve the count of unique singers.\",\n",
      "    \"speak\": \"I'm now running the query to count the number of unique singers in the database.\",\n",
      "    \"function\": [\n",
      "        {\n",
      "            \"name\": \"query_sqlite\",\n",
      "            \"arguments\": {\n",
      "                \"allow_change_data\": false,\n",
      "                \"query\": \"SELECT COUNT(DISTINCT Singer_ID) FROM singer;\"\n",
      "            }\n",
      "        }\n",
      "    ]\n",
      "}\n",
      "####################### ITER 2, STEP 2: ACTION #######################\n",
      ">>> Executing function query_sqlite ...\n",
      ">>> END \n",
      "system: Execution Results:\n",
      "1. query_sqlite:\n",
      "    [EXECUTE STATUS]: SUCCESS\n",
      "    [EXECUTE RESULT]: [(6,)]\n",
      "\n",
      "##################### ITER 3, STEP 1: REASONING ######################\n",
      "assistant: {\n",
      "    \"thought\": \"I successfully executed the query to count the distinct singers in the database and received the result.\",\n",
      "    \"speak\": \"We have 6 unique singers in the database.\",\n",
      "    \"function\": []\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "from agentscope.message import Msg\n",
    "\n",
    "mss = Msg(\n",
    "    name=\"user\", \n",
    "    content=\"How many singers do we have?\", \n",
    "    role=\"user\"\n",
    ")\n",
    "logger.chat(mss)\n",
    "\n",
    "sql_query_mss1 = agent(mss)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "user: Show all countries and the number of singers in each country.\n",
      "##################### ITER 1, STEP 1: REASONING ######################\n",
      "assistant: {\n",
      "    \"thought\": \"To show all countries and the number of singers in each, I'll need to create and run an SQL query that counts singers by their country. This involves grouping the data by country and then counting the number of singers in each group.\",\n",
      "    \"speak\": \"Let me find out how many singers there are in each country for you.\",\n",
      "    \"function\": [\n",
      "        {\n",
      "            \"name\": \"generate_sql_query\",\n",
      "            \"arguments\": {\n",
      "                \"question\": \"Show all countries and the number of singers in each country.\"\n",
      "            }\n",
      "        }\n",
      "    ]\n",
      "}\n",
      "####################### ITER 1, STEP 2: ACTION #######################\n",
      ">>> Executing function generate_sql_query ...\n",
      ">>> END \n",
      "system: Execution Results:\n",
      "1. generate_sql_query:\n",
      "    [EXECUTE STATUS]: SUCCESS\n",
      "    [EXECUTE RESULT]: SELECT To show all countries and the number of singers in each country based on the database schema provided, you can use the SQL query below. This query groups singers by their country and counts the number of singers in each country: ```sql SELECT Country, COUNT(*) AS NumberOfSingers FROM singer GROUP BY Country ``` This SQL query selects the `Country` column from the `singer` table and uses the `COUNT(*)` function to count the total number of singers for each country. The `GROUP BY` clause groups the results by the `Country`, so you get the count of singers for each distinct country listed in the `singer` table.\n",
      "\n",
      "\n",
      "##################### ITER 2, STEP 1: REASONING ######################\n",
      "assistant: {\n",
      "    \"thought\": \"The generate_sql_query function has provided the correct SQL query for showing all countries and the number of singers in each. Now, I need to execute this query using the query_sqlite function to get the desired data.\",\n",
      "    \"speak\": \"I'm now running the query to find out the number of singers in each country.\",\n",
      "    \"function\": [\n",
      "        {\n",
      "            \"name\": \"query_sqlite\",\n",
      "            \"arguments\": {\n",
      "                \"allow_change_data\": false,\n",
      "                \"query\": \"SELECT Country, COUNT(*) AS NumberOfSingers FROM singer GROUP BY Country\"\n",
      "            }\n",
      "        }\n",
      "    ]\n",
      "}\n",
      "####################### ITER 2, STEP 2: ACTION #######################\n",
      ">>> Executing function query_sqlite ...\n",
      ">>> END \n",
      "system: Execution Results:\n",
      "1. query_sqlite:\n",
      "    [EXECUTE STATUS]: SUCCESS\n",
      "    [EXECUTE RESULT]: [('France', 4), ('Netherlands', 1), ('United States', 1)]\n",
      "\n",
      "##################### ITER 3, STEP 1: REASONING ######################\n",
      "assistant: {\n",
      "    \"thought\": \"The query has been executed successfully and I got the result indicating the number of singers in each country. Now I will present this data to the user.\",\n",
      "    \"speak\": \"We have 4 singers from France, 1 singer from Netherlands, and 1 singer from the United States.\",\n",
      "    \"function\": []\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "mss = Msg(\n",
    "    name=\"user\", \n",
    "    content=\"Show all countries and the number of singers in each country.\", \n",
    "    role=\"user\"\n",
    ")\n",
    "logger.chat(mss)\n",
    "\n",
    "sql_query_mss2 = agent(mss)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "user: How many singers have held concert in multiple stadiums? Give me the name of these singers.\n",
      "##################### ITER 1, STEP 1: REASONING ######################\n",
      "assistant: {\n",
      "    \"thought\": \"To find out how many singers have held concerts in multiple stadiums and get their names, I need to create a SQL query that joins relevant tables to count the number of distinct stadiums for each singer. Singers with counts greater than one have held concerts in multiple stadiums. I will start by generating the SQL query.\",\n",
      "    \"speak\": \"Let me find out which singers have held concerts in multiple stadiums for you.\",\n",
      "    \"function\": [\n",
      "        {\n",
      "            \"name\": \"generate_sql_query\",\n",
      "            \"arguments\": {\n",
      "                \"question\": \"How many singers have held concerts in multiple stadiums and what are their names?\"\n",
      "            }\n",
      "        }\n",
      "    ]\n",
      "}\n",
      "####################### ITER 1, STEP 2: ACTION #######################\n",
      ">>> Executing function generate_sql_query ...\n",
      ">>> END \n",
      "system: Execution Results:\n",
      "1. generate_sql_query:\n",
      "    [EXECUTE STATUS]: SUCCESS\n",
      "    [EXECUTE RESULT]: SELECT s.Name, COUNT(DISTINCT c.Stadium_ID) FROM singer AS s JOIN singer_in_concert AS sic ON s.Singer_ID = sic.Singer_ID JOIN concert AS c ON sic.concert_ID = c.concert_ID GROUP BY s.Name HAVING COUNT(DISTINCT c.Stadium_ID) > 1\n",
      "\n",
      "\n",
      "##################### ITER 2, STEP 1: REASONING ######################\n",
      "assistant: {\n",
      "    \"thought\": \"The generate_sql_query function successfully generated the necessary SQL query to find singers who have held concerts in multiple stadiums along with their names. I will now use the query_sqlite function to execute this SQL query and retrieve the required information.\",\n",
      "    \"speak\": \"I'm now running the query to find out which singers have performed in multiple stadiums.\",\n",
      "    \"function\": [\n",
      "        {\n",
      "            \"name\": \"query_sqlite\",\n",
      "            \"arguments\": {\n",
      "                \"allow_change_data\": false,\n",
      "                \"query\": \"SELECT s.Name, COUNT(DISTINCT c.Stadium_ID) FROM singer AS s JOIN singer_in_concert AS sic ON s.Singer_ID = sic.Singer_ID JOIN concert AS c ON sic.concert_ID = c.concert_ID GROUP BY s.Name HAVING COUNT(DISTINCT c.Stadium_ID) > 1\"\n",
      "            }\n",
      "        }\n",
      "    ]\n",
      "}\n",
      "####################### ITER 2, STEP 2: ACTION #######################\n",
      ">>> Executing function query_sqlite ...\n",
      ">>> END \n",
      "system: Execution Results:\n",
      "1. query_sqlite:\n",
      "    [EXECUTE STATUS]: SUCCESS\n",
      "    [EXECUTE RESULT]: [('John Nizinik', 2), ('Justin Brown', 3), ('Timbaland', 2), ('Tribal King', 2)]\n",
      "\n",
      "##################### ITER 3, STEP 1: REASONING ######################\n",
      "assistant: {\n",
      "    \"thought\": \"The SQL query has been executed successfully and the resulting dataset includes all singers who have held concerts in multiple stadiums.\",\n",
      "    \"speak\": \"The singers who have held concerts in multiple stadiums are John Nizinik, Justin Brown, Timbaland, and Tribal King.\",\n",
      "    \"function\": []\n",
      "}\n"
     ]
    }
   ],
   "source": [
    "mss = Msg(\n",
    "    name=\"user\", \n",
    "    content=\"How many singers have held concert in multiple stadiums? Give me the name of these singers.\", \n",
    "    role=\"user\"\n",
    ")\n",
    "logger.chat(mss)\n",
    "sql_query_mss3 = agent(mss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use the following code to build a conversation with the ReAct agent:\n",
    "\n",
    "``` python\n",
    "from agentscope.agents import UserAgent\n",
    "\n",
    "user = UserAgent(name=\"User\")\n",
    "\n",
    "x = None\n",
    "while True:\n",
    "    x = user(x)\n",
    "    if x.content == \"exit\":\n",
    "        break\n",
    "    x = agent(x)\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The capabilities of the ReAct agent are indeed very powerful. By modifying prompts and expanding the corresponding usage tools, the ReAct agent can leverage the combination of LLM and tools to accomplish more complex tasks.\n",
    "\n",
    "We encourage users to delve deeper into exploring the Agent's functionalities and to try out the capabilities of the LLM agent on their own."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
